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Abstract 

Methods based on hypothesis tests (HTs) in the Haar domain are widely used to denoise Poisson 
count data. Facing large datasets or real-time applications, Haar-based denoisers have to use the 
decimated transform to meet limited-memory or computation-time constraints. Unfortunately, 
for regular underlying intensities, decimation yields discontinuous estimates and strong "stair- 
case" artifacts. In this paper, we propose to combine the HT framework with the decimated 
biorthogonal Haar (Bi-Haar) transform instead of the classical Haar. The Bi-Haar filter bank 
is normalized such that the p-values of Bi-Haar coefficients (pbh) provide good approximation 
to those of Haar (ph) for high-intensity settings or large scales; for low-intensity settings and 
small scales, we show that pbh are essentially upper-bounded by pu- Thus, we may apply 
the Haar-based HTs to Bi-Haar coefficients to control a prefixed false positive rate. By doing 
so, we benefit from the regular Bi-Haar filter bank to gain a smooth estimate while always 
maintaining a low computational complexity. A Fisher-approximation-based threshold imple- 
menting the HTs is also established. The efficiency of this method is illustrated on an example 
of hyperspectral-source-flux estimation. 
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1. Introduction 

Astronomical data analysis often requires Poisson noise removal [1] . This problem can 
be formulated as follows: we observe a gr-dimensional (qD) discrete dataset of counts 
v = (vi)i e zi where i>, follows a Poisson distribution of intensity Ai, i.e. m ~ V(Xi). Here 
we suppose that Vi's are mutually independent. The denoising aims at estimating the 
underlying intensity profile A = (Xi)iezi from v. 

A host of estimation methods have been proposed in the literature (see the reviews [2] [3] 
and their citations), among which an important family of approaches based on hypothesis 
tests (HTs) is widely used in astronomy [4,5] [6]. These methods rely on Haar transform 
and the HTs are applied on the Haar coefficients to control a user-specified false positive 
rate (FPR). When working with large datasets or real-time applications, the decimated 
Haar transform is generally required to meet limited-memory or computation-time con- 
straints. This is even more true when processing astronomical hyperspectral data, which 
are usually very large in practice. Unfortunately, for regular underlying intensities, deci- 
mation yields discontinuous estimates with strong "staircase" artifacts, thus significantly 
degrading the denoising performance. Although [7] and [8] attempted to generalize the 
HTs for wavelets other than Haar, [7] is more computationally complex than Haar-based 
methods, and [8] adopts an asymptotic approximation which may not allow reasonable 
solutions in low-count situations. In an astronomical image decompression context, [9] 
has also proposed to remove Haar block artifacts by minimizing at each resolution level 
the £ 2 -norm of the Laplacian of the solution under some constraints on its wavelet co- 
efficients. It has been shown that this approach was efficient in removing the artifacts, 
but it requires solving J minimization problems, where J is the number of scales. This 
can be quite time-consuming and would limit the interest in using Haar for large-dataset 
analysis. 

In this paper, we propose to combine the HT framework with the decimated bi- 
orthogonal Haar (Bi-Haar) transform. The Bi-Haar filter bank is normalized such that 
the p- values of Bi-Haar coefficients (pbh) approximate those of Haar (j>h) for high- 
intensity settings or large scales; for low-intensity settings and small scales, we show that 
Pbh are essentially upper-bounded by pn- Thus, we may apply the Haar-based HTs to 
Bi-Haar coefficients to control a prefixed FPR. By doing so, we benefit from the regular 
Bi-Haar filter bank to gain a smooth estimate. A Fisher-approximation-based threshold 
implementing the HTs is also established. We find that this approach even exhibits a 
performance comparable to the more time/space-consuming translation-invariant Haar 
(TI Haar or undecimated Haar) denoising in some of our experiments. The efficiency of 
this method is also illustrated on an example of hyperspectral-source-flux estimation. 

The paper is organized as follows. We begin with the review of the wavelet HTs in 
Section 2, and then Bi-Haar domain tests are presented in Section 2.2. Section 2.3 details 
some thresholding operators implementing the tests. The final denoising algorithm is 
summarized in Section 2.4, and the numerical results are shown in Section 3. We conclude 
in Section 4, and the mathematical details are deferred to the appendices. 
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2. Hypothesis testing in the wavelet domain 

Wavelet domain denoising can be achieved by zeroing insignificant coefficients while 
preserving significant ones. We detect significant coefficients by applying a binary HT on 
each wavelet coefficient d: 

H :d = 0vs.H 1 :d^0 

Note that since any wavelet has a zero mean, if d comes from a signal of constant intensity 
within the wavelet support, then d € H . 

Individual HTs are commonly used to control a user pre-specified FPR in the wavelet 
domain, say a. The tests are carried out in a coefficient-by-cocfficicnt manner. That is, 
the p- value of each coefficient pi is calculated under the null hypothesis H n . Then, all 
the coefficients with pi > a will be zeroed. If we desire to control global statistical error 
rates, multiple HTs may be adopted such as Bonferroni correction which controls the 
Family- Wise Error Rate (FWER), and the Bcnjamini and Hochberg procedure [10] [11] 
controlling the false discovery rate (FDR). 

2.1. p-values of wavelet coefficients under H 

To carry out HTs, we need to compute the p- value of each wavelet coefficient under 
Hq. Although the probability density function (pdf) of a iJo-coefficicnt has been derived 
in [7], this pdf has no closed form for a general wavelet. Thus the p- value evaluation in 
practice is computationally complex. 

To obtain distributions of manageable forms, simple wavelets are preferred, such as 
Haar. To the best of our knowledge, Haar is the only wavelet yielding a closed-form 
pdf which is given by [12] (n > 0): Pr(d = n; A) = e~ 2A i„(2A), where d = X x - X 2 , 
Xi,X 2 ~ 'P(X), and /„ is the n-th order modified Bessel function of the first kind. For 
negative n, the probability can be obtained by symmetry. The tail probability (p-value) 
is given by [13]: 

Pr(d>n;A)=Pr( X 2 2n) (2A)<2A), n>l (1) 

where xf /) (A) is the non-central chi-square distribution with / degrees of freedom and 
A as non-centrality parameter. 

2.2. Bi-Haar domain testing 

Haar wavelet provides us with a manageable distribution under H . But due to the lack 
of continuity of Haar filters, its estimate can be highly irregular with strong "staircase" 
artifacts when decimation is involved. 

To solve this dilemma between distribution manageability and reconstruction regular- 
ity, we propose to use the Bi-Haar wavelet. Its implementation filter bank is given by 
[1]: 
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fc = 2- c [l,l], ff = 2- c r[^,-l,l,-|,-|]; 

/i = 2 c - 1 r[-w,l,l,i -k 5 = 2 C - 1 [1,-1] 



where c and r = (1 + 2~ 5 )~ 1 / 2 are normalizing factors, (ft, <?) and (ft, g) are respectively 
the analysis and synthesis filter banks. Note that our Bi-Haar filter bank has an unusual 
normalization. The motivation behind this is to ensure that the Bi-Haar coefficients will 
have the same variance as the Haar ones at each scale. Let us also point out that to 
correct for the introduction of the factor r, the Bi-Haar coefficients must be multiplied 
by r _1 at each stage of the recursive reconstruction. For comparison, the Haar filter 
bank is (ft = 2- c [l,l], g = 2" c [-l,l], ft = 2 C " 1 [1,1], g = 2 C - 1 [1,-1]). It follows that 
the synthesis Haar scaling function is discontinuous while that of Bi-Haar is almost 
Lipschitz [14] [15]. Hence, the Bi-Haar reconstruction will be smoother. 

At scale j > 1, let us define Xj = 2 J A where A is the underlying constant intensity. Then, 
a Haar coefficient can be written as dj = 2~ c i{X\ — X 2 ) where X\,X 2 ~ V(Xj/2) are 
independent. We note pn '■= Pr(d' 1 > 2~ c: > ko\Ho) to be the p- value of a Haar coefficient 



where fco = 1, 2, • • •. Accordingly, a Bi-Haar coefficient can be written as df = 2~ c ir{X i 



3" 



X A + \{X X - X 2 )), where X 1 ,X 2 ~ V(Xj) and X 3 ,X 4 - V(Xj/2) are all independent. 
We note pbh '■= Pr(d*' 1 > 2~ c; >ko\Ho) to be the p-value of a Bi-Haar coefficient at the 
same critical threshold as for pn- These definitions can be extended to higher dimensions 
(q > 1) straightforwardly. 

For high-intensity settings or for large scales, dj and dfj h will be asymptotically normal 
with the same asymptotic variances a\ = a\ h = 2 9J ( 1 ~ 2c ) A due to the normalized filter 
banks. Thereby, they will have asymptotically equivalent tail probabilities, i.e., pbh ~ 
Ph- 

For low intensity settings (A <C 1) and small scales, the following proposition (proof in 
Appendix A) shows for ID signals that pbh is essentially upper-bounded by pn under 
H a . The bounds for multidimensional data (q > 1) are also studied in Appendix A. 
Proposition 1 We have the following upper-bound for ID signals 



Pbh < Ph + A(Xj)(l - 2p H ) (2) 



where 



A(X j 



e- 2 ^ 7 (2A j ) + 2^7 ro (2A J ) 



m—1 



AsX^0+, A(A i ) = f^A 9 + (A 9 ). 

This theoretical bound is clearly confirmed by the numerical simulations shown in Table 1. 
Here we show the results for Aj £ [10 -1 ,10 2 ] and different critical thresholds fco at the 
tails of the distributions. We indeed observe that pbh is always strictly smaller than ph- 
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Table 1: pn and pbh 
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Every parenthesis shows (ph,Pbh) for ID signals, where we always observe that pbh < PH- 



2.3. Thresholds controlling FPR 

For individual tests controlling FPR, the HTs can be implemented by thresholding 
operators. In other words, one can find tj such that Pr(|d* ft '| > tj\H a ) < a where a rep- 
resents the controlled FPR. Now consider the Haar case and suppose that we have derived 
the Haar threshold tj under the controlled FPR. Then, by setting tj := 2-°^\2 c Hj\ the 
results in Section 2.2 allow us to conclude that the FPR for a Bi-Haar test will always 
be upper-bounded by a. We point out that to simplify the presentation, tj and tj are 
supposed to be scale-dependent only, but scale and location-dependent thresholds can 
be derived using the same procedure presented below. 



2.3.1. CLTB threshold [4,8,5,6] 

The Haar coefficient for qD data can be written as dj = 2~ c 3 q {X\ — X 2 ) where Xi,X 2 ~ 
V(Xj/2) are independent. It follows from (1) that: 

Pr(dJ > t,\Hn) = Pr (x?2 roj )ft) < A.,) » Prftxy) < A,) (3) 

where rrij = 2 cjq t :j , 7 = (2m j + 2Xj)/(2mj + Xj), f = (2m j + X j ) 2 /(2m :j + 2Xj), xf v) 
is a central chi-square variable and Z <~ Af(0, 1). Here, two stages of approximation are 
used: 1) the non-central chi-square distribution is first approximated by a central one (3) 
[16]; 2) the central chi-square variable is then approximated by a normal one (4) using 
the central limit theorem (CLT). tj is thus called the CLT-based (CLTB) threshold. 
Consequently, it remains to solve the equation (4) — a/2, and the solution is given by: 

h = S-^ 1 (z 2 a/2 + y/*i /2 +4-\ S zl /2 ) (5) 

where z a j 2 = <E>- 1 (1 — a/2), and $ is the standard normal cdf. Universal threshold can 
also be obtained by setting z a / 2 = ^J2 In Nj in (5) where Nj is the total number of 
coefficients in one band at scale j. 
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2.3.2. FAB threshold 

An improvement of CLTB threshold can be achieved by replacing (4) with an approx- 
imation of faster convergence, e.g., the following one proposed by Fisher [17]: 



y2^^AA(v/27^I,l), f^oo 
Therefore, (4) is changed to: 

^ (ix\ f) < A,-)* Pr [z > VW—1 - ^ 
Let us denote: 



(6) 



(7) 



G{m,j) := V2/-1 



'2A,- 



(2mj + Aj) 2 



1 



, A j (2m J -+A J -) 



7 y mj + Aj Y m i + A J 

It remains to solve G(nij) — z a / 2 , which leads to a quartic equation in mf 



(8) 



16mj + 



16A, - 8(z 2 a/2 + 1) 



+1)2 _ (20^ +i2) Aj+ 4A 



+ 



2(4/2 + 1) 2 A, - 16^ /2 A 2 - 4A 2 



"»j + (4/2 + 1) 2 A? - 4z 2 /2 Af = 



(9) 



The final Fisher-approximation-based (FAB) threshold tj is obtained from m*, the so- 
lution of (9). Owing to the following results, we do not need to write out the explicit 
expression of m*, which could be rather complex: 

Proposition 2 The feasible condition for mj is given by (10), and the feasible solution 
m* exists and is unique. 



1 



Z a/2 



2A, + 1 + (4/ 2 + (12A, + 2)4 /2 + 4A 2 + 12A j + 1 



1/2 



(10) 



Proposition 2 implies that we can use any numerical quartic-equation solver, e.g. Hacke's 
method [18], to find the four solutions of (9). One and only one of the solutions will 
satisfy (10), which is m*. The universal threshold can also be derived in the same way 
as in the CLTB case. 



2.4. Summary of the denoising controlling FPR 

Note that the thresholds tj depend on the background rate at scale j (i.e. Xj). Without 
any prior knowledge, it can be estimated by the values of the approximation coefficients at 
scale j + 1 (i.e. aj + \ ) . Here, the wavelet denoising should be carried out in a coarse-to-fine 
manner, outlined as follows: 
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Algorithm 1 Poisson noise removal by HTs in the Bi-Haar domain 

1: Bi-Haar transform of v up to j = J to obtain aj and dj h (1 < j < J) 

2: for j = J down to 1 do 

3: Aj = 2 J9 A if A is known; otherwise Xj — max^^Oj, 0) 

4: Testing dj by applying thresholds tj for a prefixed FPR = a 

5: Reconstruct Oj_i by inverse Bi-Haar transform 

6: end for 

7: Positivity projection: A = max(ao, 0) 



3. Results 



3.1. Haar vs. Bi-Haar denoising for regular intensities 



To compare Haar and Bi-Haar denoising for regular intensities, we generate noisy 
signals from the "Smooth" function [2] (sec Fig. 1(a)) and measure the Normalized Mean 
Integrated Square Error (NMISE) per bin from the denoised signals. The NMISE is 
defined as: NMISE := — Ai) 2 /Ai)/A?], where (Aj)j is the intensity estimate. 

Note that the denominator Aj plays the role of variance stabilization in the error measure. 

Fig. 1(a) shows the denoising examples given by Haar, Bi-Haar and TI Haar estima- 
tions, where FAB thresholds are applied to control a FPR a = 1CP 3 . The original intensity 
function is scaled to cover a wide range of intensities, and Fig. 1(b) compares the NMISEs 
(measured from 100 replications) of the three estimators as functions of the underlying 
peak intensity. 

It can be seen that the Bi-Haar estimate is much more regular than the Haar one, and 
is even almost as good as TI Haar at every intensity level under the NMISE criterion. 
This surprising performance is gained with the same complexity as in the Haar denoising, 
i.e., O(N) only, as opposed to O(AHogAf) in the TI Haar case. 




Fig. 1. Denoising the "Smooth" function (length = 1024). Estimates from Haar, Bi-Haar and TI Haar 
(undecimated) are compared, a = 10~ 3 and J = 7. (a) denoising results; (b) NMISEs. 



7 



3.2. Source- flux estimation in astronomical hyper spectral data 

We apply our method to source- flux estimation in astronomical hyperspectral images. 
A hyperspectral image v(x, y, v) is a "2D+1D" volume, where x and y define the spatial 
coordinates and v indexes the spectral band. Each bin records the detected number of 
photons. As the three axes of our data have different physical meanings, we are motivated 
to apply a "2D+1D" wavelet transform instead of using the classical 3D transform. That 
is, we first carry out a complete 2D wavelet transform for spatial planes, and then a 
ID transform along the spectral direction. We use j xy and j v to denote the j-th spatial 
scale and the j-th spectral scale, respectively. Hyperspectral data in practice can be 
very large, implying that fast dcnoising is only possible with decimated transforms (the 
execution time of the example below on a P4 2.8GHz PC is 13s for our Bi-Haar denoising, 
i.e., more than 50 times faster than the TI Haar denoising (665s)), not to mention the 
memory space required by the redundant TI transform. 

Our simulated data contain a source having a Gaussian profile. The source amplitude 
A v decreases from 2 to 10~ 4 as v increases. One example band is shown in Fig. 2(a). 
The observed counts at that band are depicted in Fig. 2(d). The denoising results using 
Haar and Bi-Haar transforms are respectively shown in Fig. 2(b) and (e), where FAB 
thresholds are applied. Fig. 2(c) illustrates the estimation smoothness gained by Bi-Haar 
by comparing a line profile of the estimated source from different methods. In hyper- 
spectral imaging, the source flux Siy) is an important quantity, which equals to the 
integral of the source intensity over its spatial support at band v. Fig. 2(f) compares the 
flux given by different denoisers. Clearly, the Haar-based approach leads to a piecewise 
constant estimate, whereas Bi-Haar provides a regular flux which is more accurate: the 
normalized £ 2 -loss for Haar and Bi-Haar flux estimates, i.e. -^\\S — S\\f2, are 14.4 and 
7.7 respectively. 

4. Conclusion 

In this paper, we proposed to combine the HT framework with the decimated Bi-Haar 
transform instead of the classical Haar for denoising large datasets of Poisson counts. 
We showed that the Haar-based individual HTs can be applied to Bi-Haar coefficients 
to control a prefixed FPR. By doing so, we benefit from the regular Bi-Haar filter bank 
to gain a smooth estimate with no "staircase" artifacts, while always maintaining a low 
computational complexity. A Fisher-approximation-based threshold implementing HTs 
is also designed. This approach could be extended in the future to fast deconvolution of 
Poisson data. 

Appendix A. Proof of Proposition 1 
PROOF. We note that 

PH = Pr(4 > 2- c ^ |#o) = E e ~ A3/ fc( A j) = E e ~ XjI \k\( X i) 

k>ko k< — ka 

where ko > 1. The p- value of oIbh is given by 
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(a) (b) (c) 




(d) (e) (f) 

Fig. 2. Source-flux estimation in a hyperspectral image (size: 129 X 129 X 64). A v € [10 _4 ,2]; J xy = 3, 
J„ = 5, FAB thresholding with a = 10~ 5 . (a) intensities at v = 15; (b) Haar-denoised data [y = 15); (c) 
estimated source profile at v = 15 (intensity along a line passing through the source center); (d) Poisson 
count image; (e) Bi-Haar-dcnoiscd data (v = 15); (f) estimated flux (Respectively for Haar and Bi-Haar 
estimates: -^=||S - S\\ e 2 = 14.4 and 7.7.) 

p BH = Pr(Xi - X 2 + 8(X 3 - X 4 ) > \8k /r] \H ) 

oo 

= ^Pr(X 3 - X 4 = k\H ) Y - X 2 = n - 8k\H ) 

fe£Z n=\8k /r~] 

where A\ 2 ~ 'P(^j), X 3 , 4 <~ V(Xj/2), and arc independent. Now we have, 

PBH = Y Pr (^3 ~X 4 = k\H ) ■ 

k>ko 

OO 

Y Pr(X 1 -X 2 =n- 8k\H ) + Prpfi - X 2 = n + 8fc|ff ) 

n=r8fe /rl 

00 

+ Y Pr(X 3 - X 4 = k\H ) Y P*{X 1 -X 2 = n-8k\H ) (A.l) 

|fc|<fe n=[8fe /rl 

9 



<PH+ E e ~ X3l \k\( X j) E e- 2A ^|„_ 8fc |(2A,) 

|fc|<fe n=[Sk a /r] 



To bound T, we use the identity [19] e x = Iq{x) + 2^^Lj In{x). As r < 1, we have 



T< e -^]T/„(2A,) = i 



n>9 



l-e- 2A ^ / (2A J ) + 2^7 m (2A j ) 



=: A(Xj ) 



Thus, < j3 ff + A(Aj)(l-2]3 ff ). As A -> 0+, we have that 4(Aj) 



28:55 



A 9 + o(A 9 ). □ 



The same arguments can be carried out to bound pbh in multi-dimensional cases. 
As an example, let us consider 2D data. A 2D wavelet transform will produce bands 
of horizontal, vertical and diagonal detail coefficients, i.e., dj-H, dj-y, and dj-o- For 
horizontal and vertical coefficients, we have that pbh < Ph + -<4(Aj)(l — 2ph), where 
Xj := 4PX. For diagonal coefficients, it can be shown that pbh < Ph + B(\j)(l — 2pn), 
where 



B(X 3 ) :- 



64 



1- E E e- 8A ^| fc |(4A,)/ |n _ 8fc| (4A,) 



n— — 64 k— — oo 



(A.2) 



To see the behavior of B(Xj) as the intensity becomes small, we note B(Xj) = Bk(Xj) + 
eK- Here, Bk is given by (A.2) with k ranging from — K to K, and ex is the residual 
which can be made arbitrary small as K increases. Then, we have for all K > 8 that 
Bk{Xj) = jr!^A 9 + o(A 9 ). Clearly, this procedure can be continued for higher dimensional 
cases (q > 2). 

Appendix B. Proof of Proposition 2 



PROOF. The facts that G(m,j) = z a/2 , z a/2 > 0, 2/ - 1 > 0, mj > and Xj > show 
(10). 

Next, when the equality in (10) holds, we have: 



A, 



G(mj) = Jzl /2 + ^-(zl /2 + 1) - \h^K /2 + 1) < z a/2 



The existence and uniqueness of the feasible solution follow from the fact that G is a 
strictly increasing function under (10), and that G(nij) — > +oo as rrij — > +oo. □ 
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